The purpose of this report is to collect and examine selected techniques for imputing missing data in terms of their impact on the predictive performance of classification algorithms. It covers both basic techniques (median/mode imputation) and more sophisticated ones (selected methods from the mice, VIM, missRanger and softImpute packages).
For testing, we used ranger as the classification algorithm; it is a fast implementation of random forests that is particularly suited to high-dimensional data. Predictive performance was assessed using AUC, balanced accuracy (BACC) and the Matthews correlation coefficient (MCC).
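
To make the three measures concrete, the following is a minimal Python sketch of how they can be computed for a binary problem. It is an illustration only, not the code used to produce the results in this report: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions introduced here.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted probabilities of class 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print("AUC: ", round(auc(y_true, scores), 3))
print("BACC:", round(balanced_accuracy(*counts), 3))
print("MCC: ", round(mcc(*counts), 3))
```

AUC is computed from the predicted scores, whereas BACC and MCC depend on the thresholded class labels, so the latter two can change with the threshold while AUC does not.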

The report contains all the results, grouped by both package and dataset.

basic (median/mode)
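
As a rough illustration of what median/mode imputation does, here is a minimal Python sketch: numeric columns are filled with the median of the observed values and all other columns with the most frequent observed value. This is not the code behind the results below. The function names and the toy rows are hypothetical, and learning the fill values on the training rows only and then reusing them on the test rows is an assumption about a reasonable setup, not something this report states.

```python
from collections import Counter
from statistics import median


def fit_imputer(rows, columns):
    """Learn one fill value per column; None marks a missing cell.

    Assumes every column has at least one observed value.
    """
    fill = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            fill[col] = median(observed)  # numeric column: median
        else:
            fill[col] = Counter(observed).most_common(1)[0][0]  # otherwise: mode
    return fill


def apply_imputer(rows, fill):
    """Return copies of the rows with missing cells replaced by the learned values."""
    return [{c: (fill[c] if v is None else v) for c, v in row.items()} for row in rows]


# Hypothetical toy data.
train = [
    {"age": 30, "job": "clerk"},
    {"age": None, "job": "clerk"},
    {"age": 50, "job": None},
    {"age": 40, "job": "nurse"},
]
test = [{"age": None, "job": None}]

fill = fit_imputer(train, ["age", "job"])
print(apply_imputer(train, fill))
print(apply_imputer(test, fill))
```

Because each fill value is a single per-column statistic, this kind of imputation costs very little to compute compared with model-based methods.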

adult

Crossvalidation results

Imputation times

## Imputation time:  0.1

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.78
## Test set MCC:  0.602

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Imputation time:  0.007

Test set results

## Test set AUC:  0.955
## Test set BACC:  0.888
## Test set MCC:  0.784

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Imputation time:  0.009

Test set results

## Test set AUC:  0.56
## Test set BACC:  0.516
## Test set MCC:  0.036

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Imputation time:  0.009

Test set results

## Test set AUC:  0.932
## Test set BACC:  0.866
## Test set MCC:  0.729

Missings overview

sick

Crossvalidation results

Imputation times

## Imputation time:  0.041

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Imputation time:  0.123

Test set results

## Test set AUC:  1
## Test set BACC:  0.986
## Test set MCC:  0.983

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Imputation time:  0.015

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.864
## Test set MCC:  0.774

Missings overview

missRanger

adult

Crossvalidation results

Imputation times

## Imputation time:

Test set results

## Test set AUC:  0.917
## Test set BACC:  0.78
## Test set MCC:  0.605

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Imputation time:

Test set results

## Test set AUC:  0.959
## Test set BACC:  0.896
## Test set MCC:  0.796

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Imputation time:

Test set results

## Test set AUC:  0.547
## Test set BACC:  0.508
## Test set MCC:  0.018

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Imputation time:

Test set results

## Test set AUC:  0.927
## Test set BACC:  0.873
## Test set MCC:  0.741

Missings overview

sick

Crossvalidation results

Imputation times

## Imputation time:

Test set results

## Test set AUC:  0.997
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Imputation time:

Test set results

## Test set AUC:  1
## Test set BACC:  0.98
## Test set MCC:  0.976

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Imputation time:

Test set results

## Test set AUC:  0.942
## Test set BACC:  0.864
## Test set MCC:  0.774

Missings overview

VIM (knn)

adult

Crossvalidation results

Imputation times

## Imputation time:  95.437

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.778
## Test set MCC:  0.601

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Imputation time:  0.33

Test set results

## Test set AUC:  0.962
## Test set BACC:  0.895
## Test set MCC:  0.797

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Imputation time:  0.481

Test set results

## Test set AUC:  0.582
## Test set BACC:  0.535
## Test set MCC:  0.076

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Imputation time:  0.158

Test set results

## Test set AUC:  0.945
## Test set BACC:  0.879
## Test set MCC:  0.754

Missings overview

sick

Crossvalidation results

Imputation times

## Imputation time:  5.582

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Imputation time:  237.384

Test set results

## Test set AUC:  1
## Test set BACC:  0.98
## Test set MCC:  0.976

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Imputation time:  1.175

Test set results

## Test set AUC:  0.942
## Test set BACC:  0.876
## Test set MCC:  0.793

Missings overview

VIM (hotdeck)

adult

Crossvalidation results

Imputation times

## Imputation time:  0.087

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.779
## Test set MCC:  0.602

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Imputation time:  0.049

Test set results

## Test set AUC:  0.963
## Test set BACC:  0.895
## Test set MCC:  0.797

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Imputation time:  0.054

Test set results

## Test set AUC:  0.622
## Test set BACC:  0.588
## Test set MCC:  0.2

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Imputation time:  0.052

Test set results

## Test set AUC:  0.905
## Test set BACC:  0.856
## Test set MCC:  0.707

Missings overview

sick

Crossvalidation results

Imputation times

## Imputation time:  0.081

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.907
## Test set MCC:  0.874

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Imputation time:  0.673

Test set results

## Test set AUC:  1
## Test set BACC:  0.984
## Test set MCC:  0.98

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Imputation time:  0.116

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.857
## Test set MCC:  0.752

Missings overview

softImpute

adult

Crossvalidation results

Imputation times

## Imputation time:  0.086

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.78
## Test set MCC:  0.602

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Imputation time:  0.012

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.859
## Test set MCC:  0.716

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Imputation time:  0.025

Test set results

## Test set AUC:  0.551
## Test set BACC:  0.535
## Test set MCC:  0.076

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Imputation time:  0.011

Test set results

## Test set AUC:  0.931
## Test set BACC:  0.859
## Test set MCC:  0.716

Missings overview

sick

Crossvalidation results

Imputation times

## Imputation time:  0.101

Test set results

## Test set AUC:  0.997
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Imputation time:  1.203

Test set results

## Test set AUC:  1
## Test set BACC:  0.978
## Test set MCC:  0.974

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Imputation time:  0.026

Test set results

## Test set AUC:  0.94
## Test set BACC:  0.876
## Test set MCC:  0.793

Missings overview